4.x: Introduce integration test for TLS session tickets #593

Bouncheck · 2025-08-22T17:21:01Z

Adds isolated SessionTicketsIT which checks if session tickets mechanism behaves as expected. It does that by watching Java's SSL debug logs and ensuring that specific substrings appear in them.

Since the server supports tickets with TLSv1.3, only that version is tested. In that version the session resumption without server-side state is done through pre-shared keys. The details are described in rfc8446 (see section 2.2).

Bouncheck · 2025-08-22T17:21:41Z

Tested with Java 8, 11 and 17.
~~Fixes #444~~ . Verifies that the feature is supported. Unfortunately the feature does less than what it was expected to do.

dkropachev

Way to test counters is prone to failure, if connection is re-estalished during the test (I have seen it happening), counters will drift and you going to get false-positive.

To test that TLS tickets work properly you need to establish connection, then break it and then check if ticket was reused, don't see it is happening here.

integration-tests/src/test/java/com/datastax/oss/driver/core/ssl/SessionTicketsIT.java

Bouncheck · 2025-08-22T23:07:17Z

Way to test counters is prone to failure, if connection is re-estalished during the test (I have seen it happening), counters will drift and you going to get false-positive.

I don't think the false positive (test passing when it should not) is possible here, but false negative is.
Are you opposed to all counter/logs based approaches? I can fix the drift by counting the number of seen handshakes (for example by counting ClientHellos) and making other checks depend on that number rather than hardcoded number.

To test that TLS tickets work properly you need to establish connection, then break it and then check if ticket was reused, don't see it is happening here.

There are 5 resumptions and 6 handshakes happening in this test. I can make the checks more explicit by tracking the ticket data from the logs and match it against what is reported to be received through NewSessionTicket messages.

dkropachev · 2025-08-23T05:06:28Z

Way to test counters is prone to failure, if connection is re-estalished during the test (I have seen it happening), counters will drift and you going to get false-positive.

I don't think the false positive (test passing when it should not) is possible here, but false negative is.

https://en.wikipedia.org/wiki/False_positives_and_false_negatives :

A false positive is an error in [binary classification](https://en.wikipedia.org/wiki/Binary_classification) in which a test result incorrectly indicates the presence of a condition (such as a disease when the disease is not present), while a false negative is the opposite error, where the test result incorrectly indicates the absence of a condition when it is actually present

Are you opposed to all counter/logs based approaches?

I’m totally fine with it, unless there’s a better way—which there isn’t.

I can fix the drift by counting the number of seen handshakes (for example by counting ClientHellos) and making other checks depend on that number rather than hardcoded number.
There are 5 resumptions and 6 handshakes happening in this test. I can make the checks more explicit by tracking the ticket data from the logs and match it against what is reported to be received through NewSessionTicket messages.

Given the fact that ChannelPool is initialized from one DriverChannel and later create per-shard DriverChannel's, it would be enough to test if Resuming session more than zero.

Bouncheck · 2025-08-26T01:21:26Z

Set protocol version to V4, removed hardcoded expected number of initiated handshakes, removed some counters and made some checks more lenient.
Now this test checks if the resumptions are used at all when configured to use TLS 1.3.

I've also ran some tests with more nodes and higher smp and in general the feature is supported but probably does not work like we would want it to.

The server seems to consistently send two session tickets per connection. Setting higher smp did not seem to impact that. As was mentioned above when channel pool is created it first makes one connection to a node and then tries to add all remaining channels at once. Having a few surplus tickets is not going to help with initialization of all those connections.
Vast majority will not use PSK because there will not be any key available for use.

I have also tried to see what happens if i stop the node and start it again using ccm. Once the node is stopped all channels try to connect again and so far it seems that only one channel tries to use session ticket as if only one ticket was being kept in cache. Other connection attempts report that there is no PSK identity to use for them. In theory there should be around as many unused tickets as there are shards, since each connection receives two tickets. I did not find yet where does this result come from.
But even if there was more tickets, once the first reconnection attempts fail due to node not being back up yet, I don't think those tickets could be reused. It seems to me they are removed from cache once fetched for use, so all tickets would be wasted on attempts before node is back up anyway (if i understand correctly).

So since it does not provide any extra value I did not set higher smp or added more nodes to the test for now.

dkropachev · 2025-08-28T04:05:52Z

@Bouncheck , please update on what is going on.

Bouncheck · 2025-08-28T14:26:20Z

In regard to the test:
I've set the number of shards to 4, however it seems that the more shards is there the more unstable starting node is. Sometimes ccm fails that step.
Added second method doing similar check, but expecting all resumptions to use PSK in order to track this issue. Marked as ignored so that CI is not needlessly red for now.

In regard to the whole issue:
It does seem that on reconnection and also on addMissingChannels multiple threads attempt to resume the same session. Only one succeeds at obtaining the preSharedKey, because once the first thread calls consumePskIdentity others will fail and report "PSK has no identity, or identity was already used".
sessionCache (in sslContext.clientCache) does hold multiple surplus entries. Some have PSK set, some not. On reconnection there is plenty of entries in session cache to choose from. I'm trying to figure out if it's possible to control which session is chosen for resumption.

Adds isolated SessionTicketsIT which checks if session tickets mechanism behaves as expected. It does that by watching Java's SSL debug logs and ensuring that specific substrings appear in them. Since the server supports tickets with TLSv1.3, only that version is tested. In that version the session resumption without server-side state is done through pre-shared keys. The details are described in rfc8446 (see section 2.2).

Bouncheck · 2025-09-03T12:19:53Z

Moved some parts to separate methods to avoid repetition.
Changed the readiness check to also look at the size of the pools.
Marked the reconnection test method as expected to fail, since JDK implementation of TLSv1.3 does not seem to provide what we were looking for. I'll sum up the findings in parent issue.

The CI failure does not look to be related. This test class does not even target Cassandra and the failure is caused by the cluster version reported by the node. It contains -SNAPSHOT label for some reason.

Bouncheck self-assigned this Aug 22, 2025

Bouncheck requested a review from dkropachev August 22, 2025 17:26

Bouncheck changed the title ~~Introduce integration test for TLS session tickets~~ 4.x: Introduce integration test for TLS session tickets Aug 22, 2025

dkropachev requested changes Aug 22, 2025

View reviewed changes

integration-tests/src/test/java/com/datastax/oss/driver/core/ssl/SessionTicketsIT.java Outdated Show resolved Hide resolved

Bouncheck force-pushed the scylla-4.x-integration-tls13 branch from 3e29572 to 2527f1e Compare August 26, 2025 00:52

Bouncheck force-pushed the scylla-4.x-integration-tls13 branch 4 times, most recently from 7ca6356 to 4fd48e0 Compare August 28, 2025 11:18

Bouncheck force-pushed the scylla-4.x-integration-tls13 branch from 4fd48e0 to 9effa23 Compare September 3, 2025 11:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

4.x: Introduce integration test for TLS session tickets #593

4.x: Introduce integration test for TLS session tickets #593

Uh oh!

Bouncheck commented Aug 22, 2025

Uh oh!

Bouncheck commented Aug 22, 2025 •

edited

Loading

Uh oh!

dkropachev left a comment

Uh oh!

Uh oh!

Bouncheck commented Aug 22, 2025 •

edited

Loading

Uh oh!

dkropachev commented Aug 23, 2025

Uh oh!

Bouncheck commented Aug 26, 2025

Uh oh!

dkropachev commented Aug 28, 2025

Uh oh!

Bouncheck commented Aug 28, 2025 •

edited

Loading

Uh oh!

Bouncheck commented Sep 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

4.x: Introduce integration test for TLS session tickets #593

Are you sure you want to change the base?

4.x: Introduce integration test for TLS session tickets #593

Uh oh!

Conversation

Bouncheck commented Aug 22, 2025

Uh oh!

Bouncheck commented Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dkropachev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Bouncheck commented Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dkropachev commented Aug 23, 2025

Uh oh!

Bouncheck commented Aug 26, 2025

Uh oh!

dkropachev commented Aug 28, 2025

Uh oh!

Bouncheck commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Bouncheck commented Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Bouncheck commented Aug 22, 2025 •

edited

Loading

Bouncheck commented Aug 22, 2025 •

edited

Loading

Bouncheck commented Aug 28, 2025 •

edited

Loading

Bouncheck commented Sep 3, 2025 •

edited

Loading